Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Add arm64 encodings for IF_SVE_CL_3A to IF_SVE_CU_3A #95514

Merged
merged 3 commits into from
Dec 4, 2023

Conversation

a74nh
Copy link
Contributor

@a74nh a74nh commented Dec 1, 2023

The next batch.

MOV is an alias for CPY. Due to the scripting there are two different opcodes even though they have the same encodings. I think that's fine. I change CPY to the preferred MOV in emit_R_R_R()

For some of these I needed INS_OPTS_SCALABLE_n_FROM_SIMD_(SCALAR/VECTOR). But, doing that increases insOpts to 7 bits, which breaks the hardcoded sizings in one of the structures. Instead, I repurposed INS_OPTS_SCALABLE_n_TO_SIMD_(SCALAR/VECTOR) to INS_OPTS_SCALABLE_n_WITH_SIMD_(SCALAR/VECTOR). I don't think we ever need to distinguish between the two.

Fixed two of the perf scores for other instructions.

The use of insOpts and emitAttr seems to have resolved into:

  • If there are no fixed sized (GP or NEON) registers in the instruction then emitAttr is EA_SCALABLE.
  • If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes) register (8 bytes).
  • If there is a fixed sized general or FP register in the instruction, then emitAttr is the size of the the register (4,8 bytes for general, 1,2,4,8 bytes for FP).
  • insOpts states the size of the scalable register elements.
  • insOpts states if there are additional register types in the instruction (but not the size of those additional types).

EDIT: Fixed vector size above.

@dotnet-issue-labeler dotnet-issue-labeler bot added the area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI label Dec 1, 2023
@ghost ghost added the community-contribution Indicates that the PR has been added by a community member label Dec 1, 2023
@ghost
Copy link

ghost commented Dec 1, 2023

Tagging subscribers to this area: @JulieLeeMSFT, @jakobbotsch
See info in area-owners.md if you want to be subscribed.

Issue Details

The next batch.

MOV is an alias for CPY. Due to the scripting there are two different opcodes even though they have the same encodings. I think that's fine. I change CPY to the preferred MOV in emit_R_R_R()

For some of these I needed INS_OPTS_SCALABLE_n_FROM_SIMD_(SCALAR/VECTOR). But, doing that increases insOpts to 7 bits, which breaks the hardcoded sizings in one of the structures. Instead, I repurposed INS_OPTS_SCALABLE_n_TO_SIMD_(SCALAR/VECTOR) to INS_OPTS_SCALABLE_n_WITH_SIMD_(SCALAR/VECTOR). I don't think we ever need to distinguish between the two.

Fixed two of the perf scores for other instructions.

The use of insOpts and emitAttr seems to have resolved into:

  • If there are no fixed sized (GP or NEON) registers in the instruction then emitAttr is EA_SCALABLE.
  • If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).
  • If there is a fixed sized general or FP register in the instruction, then emitAttr is the size of the the register (4,8 bytes for general, 1,2,4,8 bytes for FP).
  • insOpts states the size of the scalable register elements.
  • insOpts states if there are additional register types in the instruction (but not the size of those additional types).
Author: a74nh
Assignees: -
Labels:

area-CodeGen-coreclr, community-contribution

Milestone: -

@a74nh
Copy link
Contributor Author

a74nh commented Dec 1, 2023

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

@a74nh a74nh force-pushed the IF_SVE_CL_3A_github branch from a5f514c to b56c574 Compare December 1, 2023 15:05
@a74nh a74nh marked this pull request as ready for review December 1, 2023 15:09
@a74nh
Copy link
Contributor Author

a74nh commented Dec 1, 2023

@BruceForstall
Copy link
Member

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

Isn't it nonsensical to have a use of sp that isn't a full-sized 8-byte use? Wouldn't "wsp" indicate a 4-byte use? Does that ever happen?

@BruceForstall
Copy link
Member

  • If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).

Is this how it is today for non-SVE? (instead of using insOpts for the element size)

Copy link
Member

@kunalspathak kunalspathak left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

src/coreclr/jit/registerarm64.h Outdated Show resolved Hide resolved
@kunalspathak
Copy link
Member

btw, do you mind posting the capstone vs. coreclr disassembly as well?

@kunalspathak kunalspathak added the arm-sve Work related to arm64 SVE/SVE2 support label Dec 2, 2023
@a74nh
Copy link
Contributor Author

a74nh commented Dec 4, 2023

capstone output (identical to clr output):

and z0.b, p1/m, z0.b, z2.b
bic z3.h, p4/m, z3.h, z5.h
eor z14.s, p5/m, z14.s, z16.s
orr z29.d, p7/m, z29.d, z31.d
add z5.b, p6/m, z5.b, z7.b
sub z15.h, p7/m, z15.h, z29.h
subr z2.s, p0/m, z2.s, z13.s
sdiv z3.s, p2/m, z3.s, z9.s
sdivr z31.d, p3/m, z31.d, z29.d
udiv z1.s, p0/m, z1.s, z0.s
udivr z13.d, p7/m, z13.d, z15.d
smax z24.b, p0/m, z24.b, z2.b
smin z9.h, p1/m, z9.h, z27.h
sabd z5.b, p2/m, z5.b, z6.b
uabd z23.s, p3/m, z23.s, z9.s
umax z15.s, p4/m, z15.s, z2.s
umin z12.d, p7/m, z12.d, z0.d
mul z5.d, p1/m, z5.d, z3.d
smulh z17.s, p5/m, z17.s, z5.s
umulh z12.b, p2/m, z12.b, z24.b
asr z5.s, p0/m, z5.s, z21.s
asrr z1.b, p7/m, z1.b, z20.b
lsl z0.h, p2/m, z0.h, z0.h
lslr z27.d, p6/m, z27.d, z31.d
lsr z5.b, p5/m, z5.b, z6.b
lsrr z15.s, p4/m, z15.s, z17.s
asr z4.b, p3/m, z4.b, z24.d
lsl z19.h, p7/m, z19.h, z3.d
lsr z0.s, p0/m, z0.s, z0.d
clasta z31.b, p7, z31.b, z31.b
clastb z30.d, p6, z30.d, z30.d
clasta h12, p1, h12, z15.h
clastb s13, p2, s13, z16.s
clastb d14, p0, d14, z17.d
clasta w0, p0, w0, z0.b
clasta w1, p2, w1, z3.h
clastb w23, p5, w23, z12.s
clastb x3, p6, x3, z9.d
shadd z15.b, p0/m, z15.b, z10.b
shsub z16.h, p1/m, z16.h, z11.h
shsubr z17.s, p2/m, z17.s, z12.s
srhadd z18.d, p3/m, z18.d, z13.d
uhadd z19.b, p4/m, z19.b, z14.b
uhsub z20.h, p5/m, z20.h, z15.h
uhsubr z21.s, p6/m, z21.s, z16.s
urhadd z22.d, p7/m, z22.d, z17.d
addp z23.b, p6/m, z23.b, z18.b
smaxp z24.h, p5/m, z24.h, z19.h
sminp z25.s, p4/m, z25.s, z20.s
umaxp z26.d, p3/m, z26.d, z21.d
uminp z27.b, p2/m, z27.b, z22.b
sqadd z28.b, p1/m, z28.b, z23.b
sqsub z29.h, p0/m, z29.h, z24.h
sqsubr z30.h, p1/m, z30.h, z25.h
suqadd z31.b, p2/m, z31.b, z26.b
uqadd z0.s, p3/m, z0.s, z27.s
uqsub z1.d, p4/m, z1.d, z28.d
uqsubr z2.b, p5/m, z2.b, z29.b
usqadd z3.b, p6/m, z3.b, z30.b
sqrshl z4.b, p7/m, z4.b, z31.b
sqrshlr z5.h, p0/m, z5.h, z30.h
sqshl z6.s, p1/m, z6.s, z29.s
sqshlr z7.d, p2/m, z7.d, z28.d
srshl z8.b, p3/m, z8.b, z27.b
srshlr z9.h, p4/m, z9.h, z26.h
uqrshl z10.s, p5/m, z10.s, z25.s
uqrshlr z11.d, p6/m, z11.d, z24.d
uqshl z12.b, p7/m, z12.b, z23.b
uqshlr z13.h, p0/m, z13.h, z22.h
urshl z14.s, p1/m, z14.s, z21.s
urshlr z15.d, p2/m, z15.d, z20.d
faddp z16.h, p3/m, z16.h, z19.h
fmaxnmp z17.s, p4/m, z17.s, z18.s
fmaxp z18.d, p5/m, z18.d, z17.d
fminnmp z19.s, p6/m, z19.s, z16.s
fminp z20.h, p7/m, z20.h, z15.h
fadda h21, p6, h21, z14.h
fadda s22, p5, s22, z13.s
fadda d23, p4, d23, z12.d
fabd z24.h, p3/m, z24.h, z11.h
fadd z25.s, p2/m, z25.s, z10.s
fdiv z28.s, p0/m, z28.s, z7.s
fdivr z29.d, p1/m, z29.d, z6.d
fmax z30.h, p2/m, z30.h, z5.h
fmaxnm z31.s, p3/m, z31.s, z4.s
fmin z0.d, p4/m, z0.d, z3.d
fminnm z1.h, p5/m, z1.h, z2.h
fmul z2.s, p6/m, z2.s, z1.s
fmulx z3.d, p7/m, z3.d, z0.d
fscale z4.h, p6/m, z4.h, z31.h
fsub z5.s, p5/m, z5.s, z30.s
fsubr z6.d, p4/m, z6.d, z29.d
andv b0, p0, z0.b
eorv h1, p1, z1.h
orv s2, p2, z2.s
orv d3, p3, z3.d
saddv d1, p4, z2.b
saddv d2, p5, z3.h
uaddv d3, p6, z4.s
smaxv d15, p7, z4.d
sminv s16, p6, z14.s
umaxv h17, p5, z24.h
uminv b18, p4, z31.b
cls z31.b, p0/m, z0.b
clz z30.h, p1/m, z1.h
cnot z29.s, p2/m, z2.s
cnt z28.d, p3/m, z3.d
fabs z27.h, p4/m, z4.h
fneg z26.s, p5/m, z5.s
not z25.b, p6/m, z6.b
abs z24.b, p7/m, z7.b
neg z23.s, p0/m, z8.s
sxtb z22.h, p1/m, z9.h
sxtb z22.s, p1/m, z9.s
sxtb z22.d, p1/m, z9.d
sxth z21.s, p2/m, z10.s
sxth z21.d, p2/m, z10.d
sxtw z20.d, p3/m, z11.d
uxtb z19.h, p4/m, z12.h
uxtb z19.s, p4/m, z12.s
uxtb z19.d, p4/m, z12.d
uxth z18.s, p5/m, z13.s
uxth z18.d, p5/m, z13.d
uxtw z17.d, p6/m, z14.d
compact z16.s, p7, z13.s
compact z15.d, p0, z12.d
mov z14.b, p1/m, b11
mov z13.s, p2/m, s10
mov z12.h, p3/m, h9
mov z11.d, p4/m, d8
mov z10.d, p5/m, sp
mov z9.h, p6/m, w30
mov z8.s, p7/m, w29
mov z7.b, p0/m, w28
lasta b6, p1, z27.b
lasta h5, p2, z26.h
lastb s4, p3, z25.s
lastb d3, p4, z24.d
lasta w1, p5, z23.b
lasta w0, p6, z22.s
lastb w30, p7, z21.h
lastb fp, p0, z20.d
rbit z28.h, p1/m, z19.h
rbit z28.b, p1/m, z19.b
rbit z28.s, p1/m, z19.s
rbit z28.d, p1/m, z19.d
revb z27.h, p2/m, z18.h
revb z27.s, p2/m, z18.s
revb z27.d, p2/m, z18.d
revh z26.s, p3/m, z17.s
revh z26.d, p3/m, z17.d
revw z25.d, p4/m, z16.d
nop

@a74nh
Copy link
Contributor Author

a74nh commented Dec 4, 2023

I assume WSP is named with as wsp? because it normally is not used. I renamed it to wsp to match capstone, but am happy to remove and change the test to use SP.

Isn't it nonsensical to have a use of sp that isn't a full-sized 8-byte use? Wouldn't "wsp" indicate a 4-byte use? Does that ever happen?

Yeah, agreed, we would never actually used it. Revered and switched test to 8byte.

@a74nh
Copy link
Contributor Author

a74nh commented Dec 4, 2023

  • If there is a fixed sized vector register in the instruction, then emitAttr is the size of the elements (1,2,4,8 bytes).

Is this how it is today for non-SVE? (instead of using insOpts for the element size)

Ah, no non-SVE uses the vector size. Looking at my code again, everything with a SIMD vector is currently one of the unsupported instructions, which is why I hadn't seen any issues.

Fixed the tests and code so that is uses size of the vector.

@a74nh
Copy link
Contributor Author

a74nh commented Dec 4, 2023

CI failures don't look connected. (This patch should have no effect on any existing tests)

@kunalspathak
Copy link
Member

superpmi failures are unrelated and we are seeing it in other PRs as well.

@kunalspathak kunalspathak merged commit 95738f3 into dotnet:main Dec 4, 2023
123 of 129 checks passed
@a74nh a74nh deleted the IF_SVE_CL_3A_github branch December 4, 2023 15:59
@github-actions github-actions bot locked and limited conversation to collaborators Jan 4, 2024
Sign up for free to subscribe to this conversation on GitHub. Already have an account? Sign in.
Labels
area-CodeGen-coreclr CLR JIT compiler in src/coreclr/src/jit and related components such as SuperPMI arm-sve Work related to arm64 SVE/SVE2 support community-contribution Indicates that the PR has been added by a community member
Projects
None yet
Development

Successfully merging this pull request may close these issues.

3 participants